We study a generalization of the setting of regenerating codes, motivated by applications to storage systems consisting of clusters of storage nodes. There are $n$ clusters in total, with $m$ nodes per cluster. A data file is coded and stored across the $mn$ nodes, with each node storing $\alpha$ symbols. For availability of data, we require that the file be retrievable by downloading the entire content from any subset of $k$ clusters. Nodes represent entities that can fail. We distinguish between intra-cluster and inter-cluster bandwidth (BW) costs during node repair. Node-repair in a cluster is accomplished by downloading $\beta$ symbols each from any set of $d$ other clusters, dubbed remote helper clusters, and also up to $\alpha$ symbols each from any set of $\ell$ surviving nodes, dubbed local helper nodes, in the host cluster. We first identify the optimal trade-off between storage-overhead and inter-cluster repair-bandwidth under functional repair, and also present optimal exact-repair code constructions for a class of parameters. The new trade-off is strictly better than what is achievable via space-sharing existing coding solutions, whenever $\ell > 0$. We then obtain sharp lower bounds on the necessary intra-cluster repair BW to achieve optimal trade-off. Under functional repair, random linear network codes (RLNCs) simultaneously optimize usage of both inter- and intra-cluster repair BW; simulation results based on RLNCs suggest optimality of the bounds on intra-cluster repair-bandwidth. Our bounds reveal the interesting fact that, while it is beneficial to increase the number of local helper nodes $\ell$ in order to improve the storage-vs-inter-cluster-repair-BW trade-off, increasing $\ell$ not only increases intra-cluster BW in the host-cluster, but also increases the intra-cluster BW in the remote helper clusters.
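To make the parameter accounting in the system model concrete, the following is a minimal sketch that only tallies the quantities named above: total storage $mn\alpha$, inter-cluster repair download $d\beta$, the cap $\ell\alpha$ on download from local helper nodes, and the $km\alpha$ symbols read when retrieving the file from $k$ clusters. The class name, method names, and the range checks ($d \le n-1$, $\ell \le m-1$, read off from "other clusters" and "surviving nodes") are assumptions made for illustration. The sketch does not compute the optimal trade-off or the intra-cluster bounds derived in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteredStorageParams:
    """Hypothetical container for the parameters named in the abstract."""
    n: int      # number of clusters
    m: int      # nodes per cluster
    k: int      # clusters contacted to retrieve the file
    d: int      # remote helper clusters contacted during a repair
    ell: int    # local helper nodes contacted during a repair
    alpha: int  # symbols stored per node
    beta: int   # symbols downloaded from each remote helper cluster

    def __post_init__(self):
        # Range checks are assumptions inferred from the wording of the model.
        if not 1 <= self.k <= self.n:
            raise ValueError("k must satisfy 1 <= k <= n")
        if not 0 <= self.d <= self.n - 1:
            raise ValueError("d helper clusters must be other clusters: d <= n - 1")
        if not 0 <= self.ell <= self.m - 1:
            raise ValueError("local helpers are surviving nodes: ell <= m - 1")

    def total_storage(self) -> int:
        """Symbols stored across all m*n nodes."""
        return self.m * self.n * self.alpha

    def inter_cluster_repair_download(self) -> int:
        """Symbols crossing cluster boundaries for one node repair (d * beta)."""
        return self.d * self.beta

    def max_local_helper_download(self) -> int:
        """Upper limit on symbols fetched from local helper nodes (ell * alpha)."""
        return self.ell * self.alpha

    def data_collection_download(self) -> int:
        """Symbols read when downloading the full content of k clusters."""
        return self.k * self.m * self.alpha


# Illustrative parameter values only; they are not taken from the paper.
params = ClusteredStorageParams(n=5, m=3, k=3, d=4, ell=2, alpha=4, beta=1)
print(params.total_storage())                  # 3 * 5 * 4 = 60
print(params.inter_cluster_repair_download())  # 4 * 1 = 4
print(params.max_local_helper_download())      # 2 * 4 = 8
print(params.data_collection_download())       # 3 * 3 * 4 = 36
```

In this accounting, increasing `ell` raises only the local-download cap. The effect on intra-cluster BW in the remote helper clusters described above is not captured by this sketch.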